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Abstract 



A Baycsian procedure is developed for multivariate stochastic volatility, using state 
f-H , space models. An autoregressivc model for the log-returns is employed. We generalize 

the inverted Wishart distribution to allow for different correlation structure between the 
observation and state innovation vectors and we extend the convolution between the 
Wishart and the multivariate singular beta distribution. A multiplicative model based on 
the generalized inverted Wishart and multivariate singular beta distributions is proposed 
for the evolution of the volatility and a flexible sequential volatility updating is employed. 
The proposed algorithm for the volatility is fast and computationally cheap and it can 
be used for on-line forecasting. The methods are illustrated with an example consisting 
of foreign exchange rates data of 8 currencies. The empirical results suggest that time- 
varying correlations can be estimated efficiently, even in situations of high dimensional 
, data. 

£3 ' Some key words: volatility, multivariate, GARCH, time series, state space model, 

, Bayesian forecasting, dynamic linear model, Kalman filter, generalized Wishart distribu- 

■ tion. 
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1 Introduction 

Consider that the p-variate time series {yt}t=i ... N is generated from the multivariate state 
space model 

y t = e t + T} t l2 e t and t = 00 t _i + n] /2 uj u (1) 
where the innovations {et}t=l,...,N an d {wt}t=i jv are individually and mutually uncorrelated, 

1 /2 

following the p-variate Gaussian distributions e t ~ N p (0, I p ) and uj t ~ N p (0, I p ), for E f ' being 
the symmetric square root of Et (Gupta and Nagar, 1999) and I p denotes the p x p identity 
matrix. Typically, at time t, y t will represent the log-returns of some assets or exchange rates 
or any other financial time series. Et is the volatility matrix at time t and interest is placed 
on its estimation, while fit is a non-negative definite matrix. An evolutionary law for Et and 
a density for the initial state #o have to be defined. It is worthwhile to note that several 
volatility models can be obtained from the formulation of model ([T]). For example, for <j> = 1, 

1 /2 

0q = 9 (with probability 1), and Qt — 0, one obtains the volatility model yt = 0+E t et- Then, 
depending on the evolution law for Et, one can obtain multivariate GARCH (MGARCH) type 
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models (Bauwens et al., 2006) or multivariate stochastic volatility (MSV) models (Asai et al., 
2006; Maasoumi and McAleer, 2006). 

The purpose of this paper is to develop an estimation procedure that will allow fast and 
efficient estimation of St and forecasting of yt- Our motivation stems from work on MSV 
models that experience problems due to the simulation-based estimation procedures they 
use, see e.g. Uhlig (1997), Aguilar and West (2000), Philipov and Glickman (2006), and 
references therein. The fast estimation procedures, proposed in this paper, aim to achieve 
high computational savings (which is necessary in high dimensional data) and yet enjoy the 
sophistication of stochastic volatility. For this to commence, one needs to define fit and 
to propose an evolutionary law for S t . Since 9 t is unobserved signal, fit is suggested to be 
specified, rather than estimated from the data, as the latter will require to resort to simulation- 
based estimation techniques (e.g. MCMC or EM algorithm), and this can cause significant 
delays of the estimation of the volatility. In this paper we adopt Bayesian estimation, for 
which we aim to specify a prior distribution for So- Then it is desirable, in order to develop 
conjugate analyses that will facilitate fast estimation, to define fit to be proportional to St. 
Indeed, in the context of a time-invariant volatility St = S, this setting is well known (Harvey, 
1989; Durbin and Koopman, 2001; Liitkepohl, 2007). In this setting fit = fiSt, where fl > 

1/2 

is known. However, this is overly restrictive, as the correlation matrix of S t et is the same 
as the correlation matrix of fi^ 2 ^- In this paper we define fit — S^fiS*^ 2 ! , where now fl 
is a known non-negative definite matrix (later in the paper we explain how this matrix can 
be specified). This setting clearly encompasses the situation when fl is scalar, and it allows 
more general and more flexible estimation. 

For the volatility covariance matrix St, we propose a multiplicative stochastic law of its 
precision S^ 1 , i.e. 

ST" 1 = kU(Z-\yB t U(Zt\), t = l,...,N, (2) 

where k = {5(1 — p) + p}{5(2 — p) + p — l} -1 , for a discount factor < 5 < 1, and ^(S^) 
denotes the unique upper triangular matrix based on the Choleski decomposition of S^ 
Here Bt is a p x p random matrix following the multivariate singular beta distribution Bt ~ 
B(m/2, 1/2), where m = 5(1 — 5)" 1 + p — 1. Some details of this distribution can be found 
in the appendix (see Lemma [5]), but for more details the reader is referred to Uhlig (1994), 
Dfaz-Garcfa and Gutierrez (1997), and Srivastava (2003). In Section [3] it is shown that when 
fl = I p and with E(.) denoting expectation, we have E(S^~ 1 |?/ t_1 ) = E(S i ~3 1 |y i_1 ), while the 
respective covariance matrix at t is increased of that at time t — 1; in this case S^ 1 follows a 
random walk. When fl is a covariance matrix, then the evolution ([2]) suggests approximately 
a random- walk type process for S^ 1 . The choice of k is made in order to accommodate the 
above random walk equations. It should be noted that, if p = 1, it is k = 1/5, and (|2|) is 
reduced to St = SJ^t^iB^ 1 (Uhlig, 1994; Triantafyllopoulos, 2007). In order to accommodate 
for the definition of fit, we generalize the Wishart distribution and we extend its convolution 
with the multivariate singular beta, which was first proved in Uhlig (1994). In order to 
support conjugate analysis, this generalization is necessary, because of the definition of fit- 
Finally, at time t = 0, 8q is assumed to be uncorrelated with {et}t=i,2,...,N and {u>t}t=i,...,N 

1 /2 1/2 

and it is assumed that 9q ~ N p (rriQ, S -PoS ), for some known prior mean vector ttiq and 
covariance matrix Pq. The scalar constant <j) is assumed known. Compared with existing 
MGARCH and MSV models, a major advantage of the proposed methodology, is that the 
likelihood function is provided in closed form. This can facilitate model comparison, but it 
can also be used as a means for the choice of the parameters, without the need to resort 
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to numerical methods in order to maximize the likelihood function (more details on this are 
provided via the data analysis in Section U|) . 

The remaining of the paper is organized as follows. The following section generalizes 
the inverted Wishart distribution and discusses some properties of the new distribution. In 
SectionOthe main algorithm of the volatility is developed and Section U] analyzes the volatility 
of foreign exchange rates data. The findings of the paper are summarized in Section [5] and 
the appendix includes all proofs of arguments in Sections [2] and [3l 



2 Generalized inverted Wishart distribution 

Let X ~ IW p (n,A) denote that the matrix X follows an inverted Wishart distribution with 
n degrees of freedom and with parameter matrix A. Given A, we use the notation \A\ for the 
determinant of A and the notation etv(A) for the exponent of the trace of A. The following 
theorem introduces a new distribution generalizing the inverted Wishart distribution. 

Theorem 1. Consider the p x p random covariance matrix X and denote with X 1 ! 2 the 
symmetric square root of X. Given p x p covariance matrices A and S and a positive scalar 
n > 2p, define Y = X 1 / 2 A~ l X 1 ! 2 so that Y follows an inverted Wishart distribution Y ~ 
IW p (n, S). Then the density function of X is given by 

V 1 2P("-P- 1 )/ 2 r p {(n-p- \)/2}\X\ n l 2 y ' ' 

where T p (.) denotes the multivariate gamma function. 

The distribution of the above theorem proposes a generalization of the inverted Wishart 
distribution, since if A = I p we have X ~ IW p (n, S) and if S = I p , we have X ~ IW p (n, A). 
This is clearly a different generalization of other generalizations of the inverted Wishart 
distribution, see Dawid and Lauritzen (1993), Brown et al. (1994), Roverato (2002), and 
Carvalho and West (2007). In the following we refer to the distribution of Theorem [1] as 
generalized inverted Wishart distribution, and we write X ~ GIW p (n, A, S). The next result 
gives some expectations of the GIW distribution. 

Theorem 2. Let X ~ GIW p (n, A, S) for some known n,A and S. Then we have 

(a) ^{X 1 ' 2 S- 1 X 1 / 2 ) = {n-2p- 2)~ 1 A; ^{X^ 1 / 2 SX' 1 / 2 ) = (n - p - lJA" 1 ; 

(b) E\X\ e = 2~P e [T p {(n - p - l)/2}]~ 1 r p {(n-2t-p- l)/2}\A\ e \S\ e , 
where E(.) denotes expectation and < I < (n — 2p)/2. 

The following property reflects on the symmetry of A and S in the GIW distribution. 
Theorem 3. If X ~ GIW p (n, A, S), for some known n, A and S, then X ~ GIW p (n,S,A). 

We motivate the estimator X(A, S) of X ~ GIW p (n, A, S) as follows. The estimator 
should be a symmetric positive definite matrix and for A and S being matrices, one possibility 
is X(A,S) = kA 1 / 2 SA 1 / 2 , for a known constant k. This estimator equals the expectation of 
the inverted Wishart distribution IW p {k + 2p + 2, A 1 I 2 SA 1 / 2 ). Since in general A 1 / 2 SA 1 / 2 / 
S 1 / 2 AS 1 / 2 , a similar estimator for X can be considered as X*(A,S) = k*S l / 2 AS l l 2 , for 
some constant k* . We propose that the desired estimator for X should satisfy the following 
requirements: 
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(1) In the univariate case (p = 1) the estimator should be X(A, S) = AS/{n — 4); 

(2) The estimator should be symmetric in A and S, i.e. X(A,S) = X(S,A); 

(3) If A = I p the estimator should reduce to the expectation from the inverted Wishart 
density IW p (n,S), i.e. X(A,S) = (n — 2p — 2)~ 1 S; If S = I p the estimator should 
reduce to the expectation from the inverted Wishart density IW p (n, A), i.e. X(A, S) = 
(n-2p- 2)~ 1 A. 

Now we propose the estimator 

X(A, S) = - ( S l ' 2 AS 1 ' 2 + A^SA 1 ' 2 ) , (3) 

2n — 4p — 4 \ / 

for which we can see that (l)-(3) are satisfied. 

It is also easy to verify that if X ~ GIW p (n, A, S), then the density of Y = X^ 1 is 

\A\ (n-p-l)/2 1 ci (n-p-l)/2 ly I (n-2p-2)/2 

p{Y) = ^ , 11 — etr(- AY 1 / 2 : SY l ' 2 /2). 

Fy ' 2P( n "P~ 1 )/ 2 T p {(n - p - l)/2} v ' ' 

This distribution generalizes the Wishart distribution; we will say that Y follows the general- 
ized Wishart distribution with n—p — 1 degrees of freedom, covariance matrices A -1 and S 1-1 , 
and we will write Y ~ GW p (n—p— 1, A" 1 , S" 1 ). It is easy to see that when A = I p or S = I p , 
the above density reduces to a Wishart density. Again our terminology and notation, should 
not cause any confusion with other generalizations of the Wishart distribution, proposed in 
the literature (Letac and Massam, 2004). 

The next theorem is a generalization of the convolution of the Wishart and multivariate 
singular beta distributions (Uhlig, 1994). For some integers m,n, denote with B p (m/2,n/2) 
the multivariate singular beta distribution with m and n degrees of freedom. The density of 
this distribution is given in the appendix (see Lemma [5]) and more details can be found in 
Uhlig (1994), Diaz-Garcia and Gutierrez (1997), and Srivastava (2003). 

Theorem 4. Let p and n be positive integers and let m > p — 1. Let H ~ GW p {m + n, A, S) 
and B ~ B p (m/2,n/2) be independent, where A and S are known covariance matrices. Then 

G = U{H)'BU{H) ~ GW p (m, A, S), 

where U(H) denotes the upper triangular matrix of the Choleski decomposition of H. 



3 Estimation 

3.1 The main algorithm 

In this section we consider estimation for model ([1]), where £t follows the evolution ([2]). The 
prior distributions of 0o\T,q and So are chosen to be Gaussian and a generalized inverted 
Wishart respectively, i.e. 

#o|£ ~ iV p (m ,Ej /2 P sJ /2 ) and S ~GIW^(n ,Q- 1 ,5o), (4) 

for some known parameters mo, Pq = PqI p , uq > 2p + 2 and Sq. Q is the limit of Qt-i(l) = 
Pt-i + £1 + I p , where Pt is a known covariance matrix. The next result shows that the limit of 
Pt (and hence the limit of Qt-i(l)) exist and it provides the value of this limit as a function 
of 4> and £1. 
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Theorem 5. If P t = R t (R t + I p )~ l , with R t = 4> 2 P t -i + ft, where ft is a positive definite 
matrix and considering the prior Pq = pqI p , for a known constant po > 0, it is 

1 

p 



t->oo ~ 2(f) 2 
v-l 



{(ft + (i - (j) 2 )i p f + 4ft} 1/2 - ft - (i - <j) 2 )i p 



P = lim P t 

for<f>^0 and P = ft(ft + I p )~ L , for <p = 0. 

This result generalizes relevant limit results for the univariate random walk plus noise 
model (Anderson and Moore, 1979, page 77; Harvey, 1989, page 119). 

Let Y ~ t p (n, m, P) denote that the p-dimensional random vector Y follows a multivariate 
Student t distribution with n degrees of freedom, mean m and scale or spread matrix P (Gupta 
and Nagar, 1999, Chapter 4). The next result gives an approximate Bayesian algorithm for 
the posterior distributions of Qt and T,t as well as for the one-step forecast distribution of yt- 

Theorem 6. In the multivariate state space model |7P with evolution ^j, let the initial 
priors for #o|^o and So be specified as in equation Q). The one-step forecast and posterior 
distributions are approximately given, for each 1 < t < N , as follows: 

(a) One-step forecast at time t: S^y* -1 ~ GIW P (5(1 — 5)~ x + 2p, Q , k~ 1 St-i) and 
VtW' 1 ~ t p (5(l - 5) _1 ,mt_i,& _1 5t_i), where k = (5(1 - p) + p)(5(2 - p) + p - ly 1 
and 5, St—i, rrit-i ar e known at time t — 1. 

(b) Posteriors at t: t |E t ,?/* ~ N p (m t , Y,] /2 P t Y}J 2 ) 
and S t |y* ~ GIW((1 - 5)~ l + 2p, Q~ l ,S t ), with 

m t = mt-i + A t e t , P t = (0 2 P t _i + ft)(c/> 2 iVi + ft + Ip)- 1 , 
e t = y t -mt-i, S t = k' 1 S t -i + e t e t , 

where A t = T,] /2 'P t E7 1/2 is approximated by A* = (S* t ) l / 2 P t (S^)- 1 / 2 , with S* = 
S(Q 1 ,St) the estimator of Et|y* (see equation and Qj_i(l) = Pt~i + ft + I p 
being approximated by its limit Q = P + ft + I p , where P is given by Theorem^ 

From Theorem [6] we have that the one-step forecast vector mean and covariance matrix 
of yt are 

for 5 > 2/3. 

We note that k > 1, since 5(1 — p) + p > 5(2 — p) + p — 1, for any < 5 < 1 and so, if we 
expand St as 

t 

for large t, we can approximate St by 2~2i=i k l ~ t eie' i . The observation that k^ 1 < 1 is impor- 
tant, because otherwise St could tend to infinity. 

From Theorem El if = I p , then Stly*" 1 ~ IW p (5(l - 5)~ l + 2p,k~ 1 Q~ l S t ~i) and 
St^ily'" 1 ~ IW P ((1 — J)" 1 + 2p, Q^^-St-i), where now Q is a variance. Thus £7/ 
Wp(<5(l - 5)" 1 +p-l, kQS^j) and S^y'" 1 ~ Wp((l - J)" 1 +p - 1, QS^) so that 



E(E t - i |y t - 1 ) = ( ^— . +p - l) fcQ5 t -\ = +P - l) QS^\ = E(S^ 1 1 |y*" i ), (5) 
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with A; = (5(l-p)+p)(5(2-p)+p-l)" 1 . Fr om the Wishart densities it follows that, given 

Var{vecp(S- 1 )|y'- 1 } > Var{vecp(S- 1 )|y'- 1 }, (6) 

in the sense that Var{vecp(S^~ 1 )|y' -1 } — Var{vecp(S^T 1 )|y* -1 } is a non-negative definite ma- 
trix, where vecp(S^T 1 ) denotes the column stacking operator of EJ" 1 . Equations §5$) and ([6]) 
show that when = I p , EjT 1 follows a random walk type evolution. When $7 is a covariance 
matrix we can see that 

E(E- 1/2 QS7 1/2 |y*- 1 ) = E(E-_ 1 / 2 QE- 1 / 2 |y*- 1 ) 

and Var{vecp(E f 1//2 QE t 1 ^ 2 |y*~ 1 )} > Yar{vecp(Tj t ^[ 2 QT lt }[ 2 \y t ^ 1 }. The proof is by noting 
that S f " 1/2 QS t " 1/2 |y^ 1 - W p (8(l - 8)' 1 + p- l,kS^\) and E^QE^/V" 1 ~ W p ((l - 
5) _1 + p — 1, and following a similar argument as in equations ([5]) and ©. Hence 

when Q is a covariance matrix E t 1 ^ 2 QE t 1 ^ 2 follows a random walk type evolution and this 
motivates the adoption of evolution ([2]). Equation ([5|) shows that under the definition of k, 
the expectation of EjT 1 equals the expectation of E^l 1; while the respective variances are 
increased from time t — 1 to t. 

3.2 Performance measures 

In this section we discuss several performance measures for model ([I]). We start giving the 
log-likelihood function of T, t . 

Theorem 7. In the state space model f7]j with evolution denote with £(Ei, . . . , Ejv; y N ) 
the log-likelihood function of Ei, . . . , Ejv, based on data y N = {y±, . . . , yjv}. Then it is 

1 N 18 - 1 N 

£(X u ...,Z N ;y N ) = c --Y,e^; 1/2 Q^ 1/2 e t - T —Y, lo ^ U ^-\)\ 

t=i t=i 

N 35 -2 N 



and 



N Np 
-Np log vr - — log \Q\ - — log k 



2(1-5) ;/ p \ 2(1-5) 

where 5 > 2/3, /c = {5(1 — p) + p}{5(2 — p) + p — l} -1 and L t is the diagonal matrix with 
diagonal elements the positive eigenvalues of I p — /c~ 1 {Z^(Ej~ 1 1 ) / } _1 E^T 1 {^/(Ej~ 1 1 )}~ 1 , with 

s,- 1 = w(E t - 1 yw(E t - 1 ). 

The choice of 5, £1 and the priors mo, Po, and So can be done by either maximizing the 
log-likelihood function or optimizing performance measures, such as the mean of square one- 
step forecast errors (MSE), the mean of square standardized one-step forecast errors (MSSE), 
the mean absolute deviation (MAD), and the mean one-step forecast error (ME). The priors 
can be set using historical data, but a general guideline suggests mo = 0, po = 1000 and 
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So = Ip- I n an Y case these initial settings are not critical to the performance of the model, 
especially given plethora of data. It then remains to specify Q and 5. Given data y±, . . . , yw, 
the definition of the above mentioned performance measures are 



1 



N 



1 



MSE 



N 



E(4 




) , MSSE 



N 



EM 




t=i 



i 



JV 




MAD 



E (mod(eit) 



. . ,mod(e pt ))' 



e P t) 



where et = (ey, . . . , e p t)' is the one-step forecast error vector, mod(eyt) denotes the modulus 
of ejt (j = 1, . . . ,p) and ut = (uu, ■ ■ ■ , ttpt)' is the standardized one-step forecast error vector, 
defined by 



so that ¥j(ut\y t 1 ) = and K(ntu' t \y t 1 ) = I p . Thus, if the model is a good fit, it should 
return MSSE « (1, . . . , 1)', ME « (0, ... , 0)', while MAD and MSE should be as small as 
possible. 

The MSSE is usually preferred to MSE, because it takes into account the forecast covari- 
ance matrix of the log-returns. However, since the MSE can be used for comparison of two 
or more models it is mentioned here. When we look at the performance of a single model 
the MSSE has the ability to judge the goodness of fit in an effective way. The MAD has a 
similar performance as the MSE, while the ME is useful if we wish to check how biased is the 
estimation method (Fildes, 1992). 

In order to choose the optimal f2 we propose the following search procedure. Since O 
has p{p + l)/2 distinct elements, for relatively large p there are many elements in f2 to be 
optimized. One can reduce the dimensionality of this optimization by considering a diagonal 
choice for fi, writing = diag(u?i, . . . , uu p ). Since < W{ < co, still a search procedure for 
the optimal Wi can be time-consuming. By defining Z = + Ip) -1 , we have that Z is also 
diagonal and it is < Z < I p . This means that we can use a grid search procedure to find 
the optimal value for Z and then choose $7 = (I p — Z)~ l Z. For Z = {z\, . . . , z p )' we can use 
Zi = l/W q , . . . , (10 9 — 1)/10 9 , for i = 1, . . . ,p and q a positive integer; for most applications 
q = 2 or q = 3 will suffice. Then we can readily see that Wi = Zi/(1 — Z{), for i = 1, . . . ,p. We 
use this search procedure in the example of Section |H 

4 FX data analysis 

In this section we consider foreign exchange rates data (FX) of 8 currencies, namely Australian 
Dollar vs US Dollar (AUD/USD), British Pound vs USD (GBP/USD), Canadian Dollar vs 
USD (CAD/USD), Dutch Guilder vs USD (DUG/USD), French Franc vs USD (FRF/USD), 
German DeutschMark vs USD (GDM/USD), Japan Yen vs USD (JPY/USD), and Swiss Franc 
vs USD (SWF/USD). The data are sampled in daily frequency, from January 1980 to Decem- 
ber 1997 and these data are reported in Franses and van Dijk (2000). We form the log-returns 
vector series yt = (yu, • • • , y&t)' , where yu is the log-returns of AUD/USD, . . ., yg 4 is the log- 
returns of SWF/USD; the data are plotted in Figure [0 To specify Q we used the log- likelihood 
criterion with the search procedure of Section [3j Using q = 2, an optimal diagonal matrix Z 
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1000 2000 3000 4000 
(a) AUD/USD 





1000 2000 3000 4000 
(b) GBP/USD 



1000 2000 3000 4000 
(c) CAD/USD 




1000 2000 3000 4000 
(d) DUG/USD 




1000 2000 3000 4000 
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1000 2000 3000 4000 
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1000 2000 3000 4000 
(g) JPY/USD 



1000 2000 3000 4000 
(h) SWF/USD 



Figure 1: FX data; shown are (a) the AUD/USD exchange rate, (b) the GBP/USD rate (c) 
the CAD/USD rate, (d) the DUG/USD rate, (e) the FRF/USD rate, (f) the GDM/USD rate, 
(g) the JPY/USD rate and (h) the SWF/USD rate. 



was Z = diag(0.44, 0.54, 0.56, 0.87, 0.92, 0.52, 0.99, 0.77) and so the diagonal that maximizes 
the log-likelihood function is O = diag(0.786, 1.174, 1.272, 6.692, 11.500, 1.083, 99.000, 3.348), 
for 5 = 0.7, <p = 1, itiq = and So = Is- This setting for Q reveals a clear benefit as opposed 
to a setting $7 = wis, for a known w > 0, as we can see that the correlation matrix of ej 
and the correlation matrix of ut are not the same. Indeed, at the posterior estimate £4773 of 
S4773, we can see that the correlation matrices of et and ut differ significantly, with the latter 
having larger correlations. 

For this data set we observed that larger values of 5 (in particular values of 5 in the range 
0.9 < 6 < 0.99) can not capture the volatility shocks, returning large values for the MSSE. 
The log-likelihood function, evaluated at the posterior estimate of St, for the above optimal 
settings was —98438.78. The four performance measures are 

MSE = (0.00009,0.00002,0.00001,0.00031,0.00244,0.00026, 1.67707,0.00022)', 
MSSE = (0.933,0.911,0.980,0.979,0.948,0.940,0.924,0.916)', 
MAD = (0.006,0.003,0.003,0.013,0.036,0.012,0.915,0.011)', 
ME = (-3.14 x 10" 7 , 7.53 x 10~ 7 , -2.73 x 10~ 6 , -2.32 x 10~ 6 , -5.77 x 10" 6 , -1.82 x 10~ 6 , 

-4.67 x 10~ 4 ,-2.08 x 10~ 8 )' 

The MSSE is slightly under (1, . . . , 1)', which means that the volatilities are slightly over- 
estimated. However, looking at the MSE, MAD, the ME and the log-likelihood function, we 
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(a) AUD/USD 
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(b) GBP/USD 
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Figure 2: Estimate of the posterior volatility for the FX data. Shown are: (a) the AUD/USD 
volatility, (b) the GBP/USD volatility (c) the CAD/USD volatility (d) the DUG/USD 
volatility (e) the FRF/USD volatility (f) the GDM/USD volatility (g) the JPY/USD volatil- 
ity and (h) the SWF/USD volatility. 



consider this model as acceptable. 

For = (<Tij t t)i,j=i,...,8, Figure [2] shows the posterior volatilities of each of the au t t (i = 
1,...,8), for the last 774 observations, i.e. from t = 4000 until t = 4773. Most of the 
volatilities are small, except for the JPY/USD, but even for small volatilities Figure [2] indicates 
clearly the highly volatile periods for each exchange rate. Figure [3] shows the posterior 
correlations of GBP /USD with the other rates. This figure confirms that the correlations 
are time- varying. By inspecting Figure [3] we observe that GBP/USD is most correlated 
with DUG/USD, FRF/USD and JPY/USD, while GBP/USD is least correlated (but still 
significantly correlated) with AUD/USD and CAD/USD. 

We note that for this data set, there were 4773 time points and for the 8-dimensional time 
series {yt}, the estimation algorithm (see Theorems [6] and [7]) , implemented in R, (including 
the search procedure to maximize the log-likelihood function) took about 8 minutes to run 
on a Pentium PC. 



5 Discussion 

In this paper we have provided a Bayesian analysis for multivariate stochastic volatility. We 
propose a generalization of the Wishart and inverted Wishart distributions and we extend the 
convolution between the Wishart and the multivariate singular beta distributions. This gener- 
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Figure 3: Estimate of the posterior correlation coefficient for the FX data. Shown are: (a) 
the posterior correlation estimate of GBP/USD and AUD/USD, (b) the correlation estimate 
of GBP/USD and CAD/USD, (c) the correlation of GBP/USD and DUG/USD, (d) the 
correlation of GBP/USD and FRF/USD, (e) the correlation of GBP/USD and GDM/USD, 
(f) the correlation of GBP/USD and JPY/USD, and (g) the correlation of GBP/USD and 
SWF/USD. 



alization is motivated from the multivariate random walk plus noise model, which innovation 
vectors are desired to have different correlations. The proposed estimation methodology is 
delivered in closed form and it is fast and easily implementable, even for high dimensional 
data. The log-likelihood of the volatility is obtained in closed form and this is an important 
step forward on multivariate volatility estimation, quoting "The estimation of the canonical 
SV model and its various extensions was at one time considered difficult since the likelihood 
function of these models is not easily calculable." from Chib et al. (2007). The availability 
of the log-likelihood function in closed form allows more efficient model comparisons, e.g. 
via sequential likelihood tests or via sequential Bayes' factors (Salvador and Gargallo, 2004; 
Triantafyllopoulos, 2006). Moreover, the proposed model develops a fast Bayesian algorithm 
not depending on simulation-based estimation procedures and not requiring many parame- 
ters to be estimated. In the special case where the volatility of state vector is proportional to 
the volatility of the observation vector, the analysis is exact and the inverse of the volatility 
matrix follows a Wishart process. 

The procedure proposed in this paper attempts to combine the simplicity of non-iterative 
algorithms with the sophistication of stochastic volatility procedures. Algorithms such as 
the one developed here, are particularly attractive, because they can model high dimensional 
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data, with low computational cost, and still they can enjoy the mathematical properties of 
closed estimation procedures, which aim to address volatility estimation and forecasting for 
a wide class of financial data. 



Appendix 

Proof of Theorem^ Consider the transformation Y = X 1 / 2 S~ l X 1 / 2 . From Olkin and Rubin 
(1964) the determinant of the Jacobian matrix of X with respect to Y is J(Y — > X) = 
J(Y -» X 1 ' 2 )J(X l l 2 -fA] = Ui<j( X i + + where Ai, .... Ap are the eigenvalues 

of S~ 1 / 2 X 1 / 2 S~ 1 / 2 and £i> • • • >£p are the eigenvalues of X 1 / 2 . We observe that if A = 7 P , 
then is an inverted Wishart distribution, since tr(— X" 1 / 2 SX" 1 / 2 /2) = tx{—SX~ x /2). 

The Jacobian J(Y — > X) does not depend on .A and so we can determine J(Y — > X) from 
the special case of A = I p . With A = I p , X ~ IW p (n, S) and 1" ~ IW p (n,I p ) and from the 
transformation Y = X 1 / 2 S'~ 1 X 1/ ' 2 we get 

, , |SK n -P- 1 )/ 2 etr(-Y- 1 /2)J(Y -> X) 
p[Y) - 



2 Kn-p-i)/2r p {(n - p - 1)/2}|S|"/ 2 



Since Y ~ IW p (n,I p ) it must be \S\~ n / 2 \S\^ n ~ p ^/ 2 J{Y -> X) = 1 and so J(Y -> X) = 

|5|(p+l)/2_ 

Now, in the general case of a covariance matrix A, we see 

P W dX = / P (n- P -l)/2r JY IWOIIVI^ ^ 7 ^ = 

x>o Jy>o 2 p(n p 1)/2 T p {(n-p - l)/2}\Y\ n / 2 

since Y ~ IW p (n, A). □ 

Proof of Theorem 0. First we prove (a) . From the proof of Theorem Q] we have that 

Y = X l / 2 S~ 1 X 1 / 2 ~ JW p (n,A) and so E(Y) = (ri-2p-2)~ 1 A and E(Y~ 1 ) = (n-p-l)A- 1 . 

Proceeding with (b) we note from the proof of Theorem Q] that for any n > 2p 

I |Xr n / 2 etr(-AX^ 1 /2 (SX -i/2/ 2 ) dx = c -i ) 
Jx>o 

where c is the normalizing constant of the distribution of X. Then 



E 

where 



X f = c [ |X|-("- 2 ^/ 2 etr(-^X- 1 / 2 5X- 1 / 2 /2) dX = -, 
Jx>o c* 

C = 2P( n ~P~ 1 )/ 2 \A\ i \S\ i T p {{n-2e-p- l)/2} 

and the range of ^ makes sure that n — 21 > 2p. The result follows by eliminating the factor 
2P(n-p-l)/2 in the f raction c / c *. □ 

Proof of Theorem^ Suppose that X ~ GIW p (n, A, S). From the normalizing constant of the 
density f(X) of Theorem[Tl we can exchange the roles of \A\ and \S\. And from tr(— AX^^S 
xX- x / 2 /2) = tr(-5X- 1 /2^x- 1 / 2 /2) we have that X ~ G/W^n, S, A). □ 
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In order to prove Theorem [J] we prove the somewhat more general result in the following 
lemma. 



Lemma 1. Let Ay ~ W p (m,I p ), A 2 = YTj=i Y j Y j ', with Y j ~ N p (0,I p ) and H ~ GW p (m + 
n,A,S), where A\, Yj (j = l,...,n) and H are independent. Define C = A\ + A 2 , B = 
{UiCyj^AiiUiC)}" 1 , G = U{H)'BU{H) and D = H l l 2 AH 1 / 2 - G^AG 1 / 2 . Then C ~ 
W p (m + n, I p ), G ~ N p (0, S), where C, G and Zj (j = 1, . . . , n), are independent. 

Proof. The proof mimics the proof of Uhlig (1994). Define Zj = U(H^ 2 AH 1 / 2 )' '{U(C)'}- l Yj 
and note that D = ^2j = iZjZj. From Theorem [1] and from Uhlig (1994), the Jacobian 
J(A 1 ,H,Y 1 ,...,Y n -f C,G,Z u ...,Z n ) is \H\~ n l 2 \C\ n / 2 \A\~^ +l )/ 2 . Then, the joint density 
function of A\ , H, A 2 can be written as 

p(A 1 ,H,A 2 )=p(A 1 )p(H)p(A 2 ) = {2^/ 2 r p (m/2)}" 1 etr(- J 4 1 /2)| J 4 1 |( m -P- 1 )/ 2 

x 2 pim+n ^ 2 T p {{m + n)/2}\S\ {m+n)/2 ] ~* \A\ {m+n ^ 2 ! etr(- AH 1 ' 2 S' 1 H 1 ' 2 /2)\H\^ m+n - p -^l 2 
x (27r)-P n / 2 etr(-A 2 /2)|A|-(P +1 ) ( dAi)(dff)(dYi) • • • ( dY n ) 
= "2 p ( m+n )/ 2 r p {(m + n)/2}j~ 1 etr(-C7/2)|C| {m+n - p ~ 1)/2 

x {2 pm/2 r p (m/2)| ( S| m/2 }" 1 |,4r /2 etr(- J 4G 1/2 5" 1 G 1/2 /2)|G| (m " ?, ~ 1)/2 

x ( 27r )-W2| 5 pn/2 etr( _ 5 -l jD/2) | j4 |(n- P -l)/2 = p ( C )p(G)p(D), 

where = |C||P|, B X I 2 AH X I 2 = G 1 ' 2 AG l l 2 + D and \H\ = \G\/\B\ are used. □ 

Proof of Theorem^ The proof is immediate from Lemma [1] after noticing that with the 
definition of the multivariate singular beta distribution (Uhlig, 1994), B ~ B p (m/2,n/2). □ 

Let A > denote that the matrix A is positive definite and let A > B denote that the 
matrices A > and B > satisfy A — B > 0. The following two lemmas are needed in order 
to prove the limit of Theorem 

Lemma 2. If the p x p matrices A,B>0 satisfy A > B, then A^ 1 < B^ 1 . 

The proof of this lemma is given in Horn and Johnson (1999). 

Lemma 3. If Pt = Rt(Rt+I p ) , with Rt = (p 2 Pt-i + Q, where Q is a positive definite matrix 
and (j) is a real number, then the sequence of p x p positive matrices {Pt} is convergent. 

Proof. First suppose that <p = 0. Then Rt = Q, for all t, and so Pt = 0(J7 + I p ) , which of 
course is convergent. 

Suppose now that 0^0. It suffices to prove that {Pt} is bounded and monotonic. Clearly, 
< Pt and since cp 2 > and Q is positive definite < Pt, for all t > 0. Since (Rt + Ip)- 1 > 0, 
(Rt + I P - Rt)(Rt + Ip)" 1 > => P t = R t (R t + Ip)" 1 < I P and so < P t < I p . For 
the monotonicity it suffices to prove that, if i^— l ^ (equivalent P f ~\ < Pt~- 2 ), then 

P^ 1 > P t z\ (equivalent Pf 1 < P t Z_\). From P~\ > Pf_\ we have P t -x < P t -2 Rt < 
R t _ 1 J?- 1 > P t _1 - Pf_\ = R' 1 - R~\ > 0, since Pf 1 = (R t + Ip)^ 1 = I p + R^ 1 ■ 

With an analogous argument we have that, if P t ~\ < Pt~} 2 , then Pf 1 — P t 1i < 0, from which 
the monotonicity follows. □ 
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Lemma 4. Let {Pt} be the sequence of Lemma{3\ and suppose that Pq = poI p , for a known 
constant po > 0. Then, with ft as in Lemma\^ the limiting matrix P = lim^oo Pt commutes 
with ft. 

Proof. First we prove that if Pt-i commutes with ft, then Pt also commutes with ft. Indeed 
from P t = (0 2 iVi + ft)0 2 iVi + ft + I p )~ l we have that Pf 1 = I p + (0 2 P_i + ft)" 1 and 
then 

p-ijr 1 = ir 1 + (</> 2 niVi + ft 2 )- 1 = ft- 1 + (<^ 2 p t _ift + ft 2 ) -1 = ft- 1 ^- 1 

which implies that ftP^ = (P i _1 ft- 1 )- 1 = (ft^P^T 1 )- 1 = P t ft and so Pt and ft commute. 
Because Po = Polp-, Pq commutes with ft and so by induction it follows that the sequence of 
matrices {Pj, t > 0} commutes with ft. Since P = lim^oo P t exists (Lemma [3]) we have 

Pft = lim (P t ft) = lim (ftP t ) = ftP 

t— >oo t— >oo 

and so P commutes with ft. □ 

Proof of Theorem [5l From Lemma [3] we have that P exists and from Lemma |4] we have that 
P and ft commute. From P t = (0 2 P_i + ft)(P_i + ft + Ip)- 1 we have P = (> 2 P + ft)(> 2 P + 
ft + ip)- 1 from which we get the equation P 2 + 0" 2 P(ft + I p — 4> 2 L P ) — (/)~ 2 ft = 0. Now since 
P and ft commute we can write 

p 2 + <j)' 2 p(n + i p - 4> 2 i p ) - <p- 2 n = o => p 2 + ^ p ( n + (! - 
p + ^L ( ft + (i - ^ )Ip) ^ 2 = _L ( ^ + (i - <j?)i p f + ft 

{ (ft + (1 - 2 )/p) 2 + 4ft} 1/2 - ft - (1 - </> 2 )j 



no 



after rejecting the negative definite root. □ 

Proof of Theorem^ The proof is inductive in the distribution of X(|y'. Assume that given 
y*- 1 the distribution of E t _i is St-ily* -1 ~ GIW((l-6)~ 1 +2p, Q" 1 , 5 t _i) and so S^Jy*- 1 - 
GWp((l - <5) _1 +p- 1, Q, 5^). From the evolution fl2]) and TheoremES we have S^ 1 ^* -1 
GWp((5(l - fc^-\), which proves that Stly'" 1 ~ G/Wp(S(l - J)" 1 , Q" 1 , fc^St-i). 

From the Kalman filter, conditionally on S t , the one-step forecast density of is 

yti W 1 ~ ^(mt-i,E t 1/2 Qt-i(l)E t 1/2 ) « iV P (m t _i, S t 1/2 QS, 1/2 ), 

where mt-i, Qt_i(l) and Q are as in the theorem. 
Given y t_1 the joint distribution of yt and is 

t-l\ IV „.t-lWV L.t-l-\ 



p(yt,^t\y ) = p(yt\^t,y )p(£ t \y 



n etr{-Q- 1 S f - 1/2 (e f e^ + fc- 1 5,_ 1 )S f " 1/2 /2} 

Cl |EJ(n+i)/2 > ( A - X ) 
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where n = 5(1 — 5) 1 + 2p and 

|fc- 1 S t _i|( rl -P- 1 )/2 



(2^) 7r / 2 2f("-P- 1 )|Q|( n -P)/ 2 r p {(n-p- l)/2}' 
The one-step forecast density of yt is 

= ci f |E t |-( n+1 >/ 2 etr{-Q- 1 S t - 1/2 (e t _ieJ_ 1 + fc- 1 E t )S t - 1/2 /2}dS t 
2P("-p)/ 2 r p {(n-p)/2} 



ci 



= r p {(n 2 P+P )/2} | fc -l 5 |(n-2 P+P -l)/2| / l5 r ( n _ 2p+p )/ 2 

^P/ 2 r p {(n-2p + p- l)/2}' 1 1 

and so ~ t p (5(l — 6) ,mt-i,k St-i), as required. This completes (a). 

Proceeding with (b) first we derive the distribution of St|y . Applying the Bayes' theorem 
we have 

, v i t \ p(^l s t>y* _1 )p( s t|y <_1 ) 

p{E,}y > = — — 

and from equation (|A-1|) we have 

p(E^) = c 2 |S t |-"*/ 2 etr(-Q- 1 S7 1/2 S t S7 1/2 /2) 

and 

5 1 

n* = n + 1 = r + 2p + 1 = + 2p, 

l—o l—o 

where St is as in the theorem and the proportionality constant is c 2 = c\ /p(yt\yt-i) , not 
depending on E*. Thus E^y* ~ GIW P ((1 — 5)" 1 + 2p, Q ,St) as required. Conditionally 
on Et, the distribution of #t follows directly from application of the Kalman filter and so 
applying the approximation Ej ~ S|, with S% as in the theorem, provides the required 
posterior distribution of 9 t . □ 

Before we prove Theorem [71 we give the following lemma. 

Lemma 5. Suppose that the pxp matrix B follows the singular multivariate beta distribution 
B ~ B p (m/2, n/2), with density 

v(B) = (n 2 -pn)/2 r p {(m + n)/2} (n _ p _ 1)/2| |(m _ p _ 1)/2 
n ' r n (n/2)r p (m/2)' 1 11 

where n is a positive integer, m > p — 1, I p — B = HiKH[, K is the diagonal matrix with 
diagonal elements the positive eigenvalues of I p — B, and H\ is a matrix with orthogonal 
columns, i.e. HiH[ = I p . For any non-singular matrix A, the density of X = AB~ X A! , is 

V (X) = J^-Pr 1 )/ 2 V p{\ m + n )/ 2 } I ^n+m-p-l\ L l-(p-n+l)/2\ y|-(m-p-l)/2 
V ^ r n (n/2)r p (m/2)' 1 11 11 

where L is the diagonal matrix including the positive eigenvalues of I p — A'X~ 1 A. 
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Proof. First note that X is a non-singular matrix and \B\ = |^4| 2 |X| . From Diaz-Garcfa 
and Gutierrez (1997), the Jacobian of B with respect to X is 

(dB) = |K| (p - n+1)/2 |L|-( p - n+1 )/Vr(^), 

where K is defined as in the theorem. Then from the singular multivariate beta density of B 
we obtain 

V (X) = t (" 2 -p»)/2 r p {(m + n)/2} 1|n|Ji:| ( w - p -i)/2|ff|( m -p-i)/2 
K ' r n (n/2)r p (m/2)' 111 11 

x |_ K -i(p-n+l)/2i£ ( |-(p-n+l)/2 ) 

from which we immediately get the required density of X. □ 
Proof of Theorem^ The likelihood function is 

iV 

LCS!,...,!!^^) =p(y 1 \Z 1 ) P (Z 1 \Z )]]p(yt\Z t ,y t - 1 )p(Z t \^ 1 ,y t - 1 ) (A-2) 

t=2 

and from the Kalman filter we have yjjEj,?/*" 1 ~ N p (mt-i, E^QSj' 2 ), where Qt_i(l) ~ Q. 
The density p(E t |E t _ x , y*" 1 ) is the density of Lemma [5J with A = fe~ 1 / 2 {W(S^_ 1 1 )~ 1 }', 

II^" 1 = i/(S ; T 1 ) / Z^(i; ; r 1 ), m = 5(1 — + p — 1 and n = 1. The required formula of the log- 
likelihood function is obtained from (|A-2j) by taking the logarithm of L (Si, ... , Sjv; y )■ D 



References 

[1] Aguilar, O. and West, M. (2000) Bayesian dynamic factor models and portfolio allocation. 
Journal of Business and Economic Statistics, 18, 338-357. 

[2] Anderson, B.D.O. and Moore, J.B. (1979) Optimal Filtering. Prentice Hall, Englewood 
Cliffs NJ. 

[3] Asai, M,, McAleer, M. and Yu, J. (2006) Multivariate stochastic volatility: A review. 
Econometric Reviews, 25, 145-175. 

[4] Bauwens, L., Laurent, S. and Rombouts, J.V.K. (2006) Multivariate GARCH models: A 
survey. Journal of Applied Econometrics, 21, 79-109. 

[5] Brown, P.J., Le, N.D. and Zidek, J.V. (1994) Inference for a covariance matrix, in: P.R. 
Freeman and A.F.M. Smith, eds. Aspects of Uncertainty. Wiley, Chichester. 

[6] Carvalho, CM. and West, M. (2007) Dynamic matrix-variate graphical models. Bayesian 
Analysis, 2, 69-98. 

[7] Chib, S., Omori, Y. and Asai, M. (2007) Multivariate stochas- 
tic volatility. CIRJE Discussion Paper F-488 (permanent website: 
|http : / / www . e . u-tokyo . ac . jp/cir j e/r esearch/03research02dp . html) . 

[8] Dawid, A. P. and Lauritzen, S.L. (1993) Hyper Markov laws in the statistical analysis of 
decomposable graphical models. Annals of Statistics, 21, 1272-1317. 



15 



[9] Dfaz-Garcfa, J. A. and Gutierrez, J.R. (1997) Proof of the conjectures of H. Uhlig on the 
singular multivariate beta and the jacobian of a certain matrix transformation. Annals of 
Statistics, 25, 2018-2023. 

[10] Durbin, J. and Koopman, S.J. (2001) Time Series Analysis by State Space Methods. 
Oxford University Press, Oxford. 

[11] Fildes, R. (1992) The evaluation of extrapolative forecasting methods. International 
Journal of Forecasting, 8, 69-80. 

[12] Franses, P.H. and van Dijk, D. (2000) Nonlinear Time Series Models in Empirical Fi- 
nance. Cambridge University Press, Cambridge. 

[13] Gupta, A.K. and Nagar, D.K. (1999) Matrix Variate Distributions. Chapman and Hall, 
New York. 

[14] Harvey, A.C. (1989) Forecasting Structural Time Series Models and the Kalman Filter. 
Cambridge University Press, Cambridge. 

[15] Horn, R.A. and Johnson, C.R. (1999) Matrix Analysis. Cambridge University Press, 
Cambridge. 

[16] Letac, G. and Massam, H. (2004) All invariant moments of the Wishart distribution. 
Scandinavian Journal of Statistics, 31, 295-318. 

[17] Liitkepohl, H. (2007) New Introduction to Multiple Time Series Analysis. Springer- 
Verlag, New- York. 

[18] Maasoumi, E. and McAleer, M. (2006) Multivariate stochastic volatility: An overview. 
Econometric Reviews, 25, 139-144. 

[19] Olkin, I. and Rubin, H. (1964) Multivariate beta distributions and independence prop- 
erties of Wishart distribution. Annals of Mathematical Statistics, 35, 261-269. 

[20] Philipov, A. and Glickman, M.E. (2006) Multivariate stochastic volatility via Wishart 
processes. Journal of Business and Economic Statistics, 24, 313-328. 

[21] Roverato, A. (2002) Hyper inverse Wishart distribution for non-decomposable graphs and 
its application to Bayesian inference for Gaussian graphical models. Scandinavian Journal 
of Statistics, 29, 391411. 

[22] Salvador, M. and Gargallo, P. (2004). Automatic monitoring and intervention in multi- 
variate dynamic linear models. Computational Statistics and Data Analysis, 47, 401-431. 

[23] Srivastava, M.S. (2003) Singular Wishart and multivariate beta distributions. Annals of 
Statistics, 31, 1537-1560. 

[24] Triantafyllopoulos, K. (2006) Multivariate control charts based on Bayesian state space 
models. Quality and Reliability Engineering International, 22, 693-707. 

[25] Triantafyllopoulos, K. (2007) Feedback quality adjustment with Bayesian state-space 
models. Applied Stochastic Models in Business and Industry, 23, 145-156. 



16 



[26] Uhlig, H. (1994) On singular Wishart and singular multivariate beta distributions. Annals 
of Statistics, 22, 395-405. 

[27] Uhlig, H. (1997) Bayesian vector autoregressions with stochastic volatility. Econometrica, 
65, 59-73. 



17 



